Integration of ASR and machine translation models in a document translation task
نویسندگان
چکیده
This paper is concerned with the problem of machine aided human language translation. It addresses a translation scenario where a human translator dictates the spoken language translation of a source language text into an automatic speech dictation system. The source language text in this scenario is also presented to a statistical machine translation system (SMT). The techniques presented in the paper assume that the optimum target language word string which is produced by the dictation system is modeled using the combined SMT and ASR statistical models. These techniques were evaluated on a speech corpus involving human translators dictating English language translations of French language text obtained from transcriptions of the proceedings of the Canadian House of Commons. It will be shown in the paper that the combined ASR/SMT modeling techniques described in the paper were able to reduce ASR WER by 26.6 percent relative to the WER of an ASR system that did not incorporate SMT knowledge.
منابع مشابه
Incorporating Knowledge of Source Language Text in a System for Dictation of Document Translations
This paper describes methods for integrating source language and target language information for machine aided human translation (MAHT) of text documents. These methods are applied to a language translation task involving a human translator dictating a first draft translation of a source language document. A method is presented which integrates target language automatic speech recognition (ASR)...
متن کاملTowards domain independence in machine aided human translation
This paper presents an approach for integrating statistical machine translation and automatic speech recognition for machine aided human translation (MAHT). It is applied to the problem of improving ASR performance for a human translator dictating translations in a target language while reading from a source language document. The approach addresses the issues associated with task independent A...
متن کاملIntegration of Speech to Computer-Assisted Translation Using Finite-State Automata
State-of-the-art computer-assisted translation engines are based on a statistical prediction engine, which interactively provides completions to what a human translator types. The integration of human speech into a computer-assisted system is also a challenging area and is the aim of this paper. So far, only a few methods for integrating statistical machine translation (MT) models with automati...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملOn the Translation Quality of Google Translate: With a Concentration on Adjectives
Translation, whose first traces date back at least to 3000 BC (Newmark, 1988), has always been considered time-consuming and labor-consuming. In view of this, experts have made numerous efforts to develop some mechanical systems which can reduce part of this time and labor. The advancement of computers in the second half of the twentieth century paved the ground for the invention of machine tra...
متن کامل